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Abstract 

We have been constructed a brand-new radiation hydrodynamics solver based upon Smoothed Particle 
Hydrodynamics (SPH), which works on parallel computer system. The code is designed to investigate the 
formation and evolution of the first generation objects at z > 10, where the radiative feedback from various 
sources play important roles. The code can compute the fraction of chemical species c, H+, H. H~, H 2 , and 
H^" by fully implicit time integration. It also can deal with multiple sources of ionizing radiation, as well as 
the radiation at Lyman- Werner band. We compare the results for a few test calculations with the results 
of one dimensional simulations, in which we find good agreements with each other. We also evaluate the 
speedup by parallelization, that is found to be almost ideal, as far as the number of sources is comparable 
to the number of processors. 
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Introduction 



> 

One of the important objectives of cosmology today is to understand the way how the first generation stars or 
galaxies are formed and how they affected the later structure formation, and how they reionized the surrounding 
\& , media. These problems have been studied intensively in the last decade, hence we already have some knowledge on 
■ these issues. However, studies which properly address the radiation transfer effects are still at the beginning, in spite 
(-h \ of their great importance. 

In order to investigate the radiation transfer effects on such issues in realistic cosmological density field, we need 
the code which can deal with hydrodynamics, gravity, chemical reactions, and naturally radiation transfer in three 
dimension. So far, various kinds of numerical approaches have been taken by several authors at different levels. There 
are several codes which focusing on the fragmentation of primordial gas (Abel, Bryan, Norman 2000; Bromm, Coppi 
& Larson 1999; Bromm, Coppi & Larson 2002; Nakamura & Umemura 1999; Nakamura & Umemura 2001; Yoshida 
et al. 2003), without radiation transfer. There also are the radiation transfer codes in which hydrodynamics are 
not coupled (Abel, Norman & Madau 1999; Nakamoto, Umemura, & Susa 2001; Razoumov et al. 2002; Maselli, 
Ferrara & Ciardi 2003). Some of these codes have the merit that they can include diffuse radiation field (Nakamoto, 



Umemura, & Susa 2001; Razoumov et al. 2002; Maselli, Ferrara & Ciardi 2003). Utilizing this strong point, these 
. . i trials can give answers to the cosmic reionization problem to some extent, however, they do not fit to investigate the 
problem of structure formation because of the dynamical nature of the phenomena and radiative feedback effects on 
star/galaxy formation. Moreover, cosmic reionization problem itself is also coupled with structure formation, since 
the formed stars, galaxies and black holes are the sources of reionization. Therefore we need some assumptions on 
the nature and number of the sources when we try to investigate the reionization of the Universe with this type of 
simulations. 

Another approach to these issues is suggested by Gnedin & Abel (2001), in which paper they propose a new type 
of approximation, called as Optically Thin Variable Eddington Tensor method (OTVET). They solve the moment 
equations of transfer equation on the assumption that the closure relation is given by the Eddington tensor which is 
assessed under the optically thin approximation. The advantage of the approximation is that radiation transfer can be 
coupled with hydrodynamics, since this approximation reduces the computational cost dramatically. Therefore they 
could trace the cosmic reionization history in a self-consistent fashion. 

Recently, Mellema et al. (2005) and Rijkhorst et al. (2005) have constructed fully coupled Radiation Hydrodynamics 
(RHD) codes applying on the cosmic reionization problem. They couple the adoptive mesh refinement hydrodynamics 
codes with adoptive ray-tracing codes. They successfully trace the radiation hydrodynamic expansion of HII region, 
as well as the formation of shadows behind dense clumps. However, their codes still do not include H 2 chemistry and 
transfer of Lyman- Werner band radiation, although they are crucial for the calculation of first structure formation. 

Another branch of radiation hydrodynamic approach was proposed by Kessel-Deynet & Burkcrt (2000), in which 
they show a scheme of ray tracing in Smoothed Particle Hydrodynamics (SPH). SPH is often applied to the simulations 
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on galaxy formation and star formation, because of its Lagrange nature and simplicity. They create grid points on 
light rays from sources to SPH particles. The code has an advantage that the ray tracing is naturally incorporated 
into Lagrangian scheme, since they utilize the neighbor lists of particles to find the grid points. We have parallelized 
their scheme in case light rays from the source can be approximated as parallel (Susa & Umcmura 2004a). However, 
the parallelized code cannot be applied to the problems with multiple sources, that are crucially required for the first 
structure formation. 

In this paper, we describe the newly developed RHD code by ourselves, in which fully coupled hydrodynamics, 
gravity, chemical reactions with H2 are included, as well as radiation transfer of ionizing photons and the Lyman- 
Werner band photons. One of the characteristic feature of this code is that we employ SPH. Radiation transfer part 
of the code is a natural extension of our previous code described in Susa & Umemura (2004a). Thanks to the newly 
developed radiation transfer and parallclization scheme, we can include multiple sources in our new code. Since it is a 
particle code, it also has an advantage that it has the good affinity to GRAPE system (Fukushige, Makino & Kawai 
2005) which can be dedicated to the gravitational force calculations and neighbor search. We also show the outcomes 
of some calculations on the test problems, that are compared with the results of one dimensional simulations. 

This paper is organized as follows. We describe the detail of the code in section 2. Then we show the results of test 
calculations in section 3, and summarize in the final section 4. 

2. Code Description 

In this section, we describe the basic methods of the code, dividing into several parts. First, we describe the method 
to evaluate gravitational force. Then, we explain how we treat hydrodynamics, radiation transfer, and thermal 
processes in order. 



We use a parallel version of Barnes-Hut Tree code (Barnes & Hut 1986) to assess the acceleration owing to 
gravitational force. The core routine of the tree code is originally provided by Jun Makino as a serial version in 
FORTRAN77, which is parallelized by ourselves. 

The computational domain is decomposed by Orthogonal Recursive Bisection (ORB) method (Dubinski 1996). We 
built up the so-called local tree and locally essential tree, then walk along the tree structure to obtain the interaction 
list. The parallelization scheme is summarized in Figure 1. Suppose we have four processor units (PUs) whose 
computational domains are represented by the rectangles. Now we try to assess the acceleration on the particles in 
PU1. In the first step (left panel), Barns-Hut tree structure is constructed on each PU from the particles involved in 
its own domain. This tree is called as a local tree, which is not sufficient to assess the force of gravity on particles 
contained in PU1. Thus we need information from other PUs. After making the local trees, we walk along them in 
other PUs as a "box" which has the same dimension as the computational domain of PU1. As a result, we find the 
tree nodes that are not opened when they interact with the particles in PU1 (upper middle panel). These tree nodes 
data are sent to PU1 as the data of "particles" , since it is not necessary to open the nodes anymore when they are 
used to calculate the gravitational force on the particles in PU1 (right panel). Finally, we can reconstruct the tree 
structure , i.e. locally essential tree, from the particles in PU1 plus the "particles" from other PUs ( lower middle 
panel ). Utilizing this new tree structure, we can assess the force of gravity on the particles in PU1. We can do the 
same operation on other PUs simultaneously, therefore, the acceleration on all the particles can be evaluated. The 
tree structure is also used to obtain the neighbor SPH particle list (see section 2.2), which will also be used in ray 
tracing routine again (see section 2.3). At present, the code is not designed to utilize GRAPE boards, however, will 
be upgraded to a parallel-GRAPE-Tree code. 

2.2. Hydrodynamics 

Hydrodynamics is calculated by SPH scheme, based upon the method summarized in Thacker et al. (2000). The 
local density of the gas is evaluated by following standard convolution method: 

p(xj) = ^2miW(\xi-Xj\,hij) (1) 
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The equation of motion for each SPH particle is given by 



dvi x ^ (Pi Pj 



V< [W(|aj< - |, ft*) + W(|sb< - x,- 1 , fy)] /2. (3) 

Here g i is the gravitational acceleration and Iljj denotes the standard 'Monaghan' artificial viscosity (Monaghan 
1992), 

-Mij(c« + c J -)/2 + 2^ J . 
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The equation of motion is integrated by the standard leap-frog method. Wc also employ the update scheme of SPH 
smoothcning radius hi given in equations (10) and (11) in Thackcr et al. (2000), which adjust the number of neighbor 
particles to nearly constant. We set the number of neighbors to be 100 in our simulations. We also remark that we 
use the tree structure in order to obtain the neighbor list. 

In order to parallelize SPH scheme, parallclization of neighbor search algorithm is inevitable and crucial. The 
method to parallelize neighbor search algorithm is schematically shown in Figure 2. Suppose we have four PUs and 
try to find the neighbor list of the particles in PU1. Left panel shows the first phase of the algorithm, in which local 
tree is made on each PU. Since the neighbor particles would be included in other PUs, local tree is not enough to find 
the complete list of neighbors. In order to obtain the information from other PUs, the particles in the "margin" of PU1 
are found by using the local trees in other PUs. The obtained data of particles are sent to PU1 (upper middle panel). 
The "margins" surrounding the boundary of the domain associated with PU1, are determined by the maximal radius 
among the particles in PU1. The radius is two times the maximal smoothening radius (denoted as 2/i max (PUl) in 
the upper middle panel). Finally, the list of particles which might be in the neighbor of particles in PU1 is completed 
(right panel). Using this list, locally essential tree is reconstructed, which is used to find neighbor particles in PU1 
(lower left panel). We can perform same operations on other PUs simultaneously, hence we obtain the complete list 
of neighbor particles for all particles. 

2.3. Ray Tracing 

It would be possible to solve full three dimensional radiation hydrodynamic problem in the next decade, but it is not 
possible at present. Actually, Nakamoto, Umemura, & Susa (2001) have performed fully three dimensional simulation 
of radiation transfer for ionizing photons including the effects of diffuse radiation. However, the computational cost 
was so expensive that it cannot be used directly for the radiation hydrodynamic problems. 

In our code, so called on-the-spot approximation (Spitzcr 1978) is employed. We solve the transfer of ionizing 
photons directly from the source, whereas we do not solve the transfer of diffuse photons. Instead, it is assumed that 
the recombination photons are absorbed in the very neighbour of the spatial position where they are emitted. Thus, 
the radiation transfer equation for intensity I v at frequency v reduces to a very simple equation: 

-r- = - n mo,Jv, (6) 
ar 

where the re-emission term is not included, since it is taken into account in the reduced recombination rate. This 
transfer equation (6) itself is easily integrated to find I v oc exp(— r„), where the optical depth r„ is defined as: 

n m a v dr, 



o 

where r is the distance from the source, njji is the HI density, and <7„ denotes the photoionization cross section. The 
on-the-spot approximation reduces the computational cost drastically, if the number of the sources is much smaller 
than the number of SPH particles. Wc also note that the frequency dependence of optical depth is originated from cr„, 
which dependence is already known. Therefore, all we need is the optical depth at a certain frequency from in order 
to assess the optical depth at arbitrary frequency. 

It is not trivial to assess the optical depth in SPH scheme. Kessel-Deynet & Burkert (2000) have proposed a 
scheme for SPH, which utilize the neighbour lists of SPH particles to create grid points on the rays from the source 
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to SPH particles. In fact, we have parallelized the scheme to use in our previous numerical simulations on the 
photoevaporation of proto-dwarf galaxies(Susa & Umemura 2004a; Susa & Umemura 2004b). However, in case the 
number of SPH particles are very large, computational cost of grid points creation on the light ray is not negligible. 
Moreover, the parallelized version of the code cannot be applied to the problems including multiple sources. Thus, we 
use the similar method which is slightly different from the original Kessel-Deynet's scheme. 

In our new scheme, we do not create so many grid points on the light ray. Instead, we just create one grid point 
per one SPH particle in its neighbor. The method is described schematically in Figure 3. Suppose that we are trying 
to assess the optical depth of the particle "0" from two sources "SI" (to<— si) and "S2" (t6«_s2)- As the first step, we 
make a thorough search on the neighbor list of the particle "0"to find the "upstream" particles which are closest to 
the light rays. The "upstream" particle are found by a simple criterion, that the angles 9\ and O2 are the smallest. In 
the case of Figure 3, particles "4" and "2" are chosen as the "upstream" particles. 

Next, we create the grids on the rays. The grid points on the rays are found as the intersections (marked as wedge- 
shapes in the figure) of the light rays and the sphere with the radius which equals to the distance between the sources 
and the "upstream" particles. The physical variables on the particles are mapped onto the intersection points from 
the "upstream" particles. Using the mapped quantities, the differential optical depths between the intersections and 
particle "0" (Ato^si and Ato<_ S2) are evaluated by simple trapezium formula. Finally, we obtain the optical depths 
as 

tq*-si = t^si + At ^si (7) 

T 0^S2 — T 2 ^S2 + At 0+ _ S2 (8) 

where r^gi and T2<— 52 denote the optical depth form SI to particle 4 and the one from S2 to particle 2. Clearly, 
we need the optical depth of particles 4 and 2 before we assess tq^si and to<-S2- Therefore, we have to evaluate the 
optical depth of all particles in reverse order of the distance from the source. 

If we try to parallelize the code, we need communications of optical depth between the processors, since we have to 
assess the optical depth of each particles in order in the computational domain decomposed by ORB( see section 2.1). 
Consequently, we have to wait to assess the optical depth until the information from "upstream" processors arrive. 
Because of this waiting time, the parallclization efficiency becomes very poor. 

In order to overcome this problem, we incorporate two tricks. First, we use Multiple Wave Front (MWF) technique 
which was originally developed by Nakamoto, Umemura, & Susa (2001). Following their method, we can overlap the 
calculations of optical depth for multiple sources. Since the number of "waiting" processors is reduced for multiple 
sources, the parallclization efficiency becomes the better the number of sources gets larger. Second, we use the scheme 
proposed by Hieneman et al. (2005), which accelerates MWF for small number of sources. In their method, processors 
have to wait only while the "upstream" processors are calculating the optical depth of particles on the boundary of 
computational domain( Sec Figure 4). Therefore, the "waiting" time is greatly reduced, which also helps to improve 
the parallclization efficiency. 

Figure 4 shows an example of propagation of transactions in computational domains. Suppose we have two sources 
(denoted by two stars) and four PUs. Each panel shows a step of transaction, which is composed of calculations and 
communications. In the shaded area, the optical depths of SPH particles are calculated, and the arrows between the 
rectangles show the required communications. We have 5 steps in this example, which guarantees that the information 
from upstream PUs have arrived before the calculations of optical depths in its own processor. The PUs marked by 
numbers [T] and/or 2 denote the one in which the calculation for sources [T] and/or 2 have finished. In the first 
panel, all four PUs are calculating the optical depth for both of the sources simultaneously, assuming the optical 
depths at the boundary particles are zero. The parallclization efficiency is very close to unity in this first step. In 
the second panel, the calculated optical depths are sent to the neighbor downstream PUs. In the third panel, the 
optical depths at the opposite side of the boundary are assessed by just summing up the received optical depths at 
the boundary and the optical depths calculated at the first step in the attached PUs. The calculated boundary data 
are sent to the downstream again (fourth panel). After receiving all the required data at the boundary, the optical 
depths in the inner part are calculated by the sum of the received boundary value and the optical depth evaluated at 
the first step. 

Here we have to remark that this parallelization scheme could introduce very slight difference among the numerical 
results of various number of processors. In the above scheme, we have to give orders among the processors, otherwise 
we cannot define the "upstream" or "downstream" processors. This means when we search the "upstream particle" 
for an SPH particle, one cannot choose the particles located in "downstream" processors. Consequently, especially in 
case both of the source and the SPH particle under consideration are located near the boundary of the processors, such 
ordering constraint could give different choice of the "upstream" particle from the one obtained in serial calculation, 
since there are no such additional constraints for the serial case. Thus the numerical results obtained from different 
number of processors are not exactly the same. 
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2.4- Chemical Reactions and Energy Equations 

The non-equilibrium chemistry and radiative cooling for primordial gas are calculated by the code developed by 
Susa & Kitayama (2000), in which H2 cooling and reaction rates are taken from Galli & Palla (1998). Atomic cooling 
by hydrogen is also taken into account, utilizing the fitting formula given in Fukugita & Kawasaki (1994). Helium 
is not included in our code at present, but it will be in the near future (see Appendix 1). Since we are primarily 
interested in the formation of first objects and their feedback processes, we neglect the cooling by metals and dusts in 
the present code. We take into consideration the chemistry for H, H + , H , H2, Hj and e~. 

The photoionization rate of HI and the photoheating rate for each SPH particle labelled as i are respectively given 



Here nm(i) represents the number density of neutral hydrogen at z-th particle, a v is the photoionization cross section, 
which is taken from the table in Shapiro & Kang (1987). The frequency at the Lyman limit is denoted by and 
ft is the solid angle. I v {i) is the intensity of the ultraviolet radiation that irradiate z-th particle, which is obtained 
by the scheme described in section 2.3. Remark that the equations (9) and (10) denote the general formulae of those 
rates at arbitrary spatial position, omitting the suffix i. 

In case the optical depth for ionizing photon for a single SPH particle is moderate, the equations (9) and (10) are 
valid. If the optical depth becomes much larger than unity, however, those expressions could lead to essentially zero 
ionization and heating rates because the equations do not conserve the number of photons numerically. Thus, we need 
something like photon conserving method (Kessel-Deynet & Burkert 2000; Abel, Norman & Madau 1999) in order to 
avoid this difficulty. 

For the purpose to find better formulae for optically thick regime, we combine equations (9), (10) without the suffix 
i and (6) to rewrite the ionization rate and the photoheating rate as follows: 



where L* denotes the intrinsic luminosity of the source and r is the distance from the source. Here we also use the 
formal solution of equation (6) and assume that size of the source is much smaller than r. 
Now we are ready to derive the better formula, that are the "volume averaged" rates: 



by 
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Here is the distance between the source and i-th particle, Ar^ denotes the spatial step of ray tracing integration, 
which satisfy 

r^n-An/2, (17) 

where j denotes the label of the particle located at "upstream" of i-th particle, which is found by the algorithm 
described in section 2.3. 

The above formula has an important advantage that the propagation of ionization front is properly traced even for 
a large particle separation with optical depth greater than unity. However, in the neighbor of the source, equations 
(9) and (10) might give better value than equations (13) and (14) do because of the poor spatial resolution and high 
ionization degree. Thus, we employ following formula for ionization and photoheating rates, in which the rates are 
switched to one another depending on the local optical depth: 

fc -W- 1 + AT . /Arcr + 1 + ATi/ATcr > U»J 

i -w- 1+Ar . /Arcr + 1+Ar . /Arcr • ^) 

Here Ar cr is a constant at which the switching occurs, that is 10 -3 in our simulations. At^ is the local optical depth 
defined as: 

= 2 ( n Hi(j) +n m (i))a VL Ar t . 

We also remark that the integral in frequency space in equations (15) and (16) can be performed in advance of the 
simulations. As a result, $i and <f>2 are found to be the functions of the optical depth only at the Lyman limit, 
since we already know the frequency dependence of the cross section t„, which is proportional to er„. Therefore we 
can create tables of $i and $2 as functions of t 1/l , or we have analytic formula for some particular cases (Susa & 
Umemura 2000; Susa & Kitayama 2000; Nakamoto, Umemura, & Susa 2001). Consequently, we need that optical 
depth just at the Lyman-limit by ray tracing. We also note that this formulation for pure hydrogen gas can be 
extended to the hydrogen + helium gas ( Appendix 1 ), although it has not been implemented yet. 

The H2 photodissociation rate is also evaluated by similar method. In order to assess the photodissociation rate, 
the transfer equation of Lyman- Werner(LW) radiation is necessary. However, the frequency dependent line transfer 
of LW band takes too much computational cost. Thus, we assume the "self-shielding function" derived by Draine & 
Bertoldi (1996), which gives the photodissociation rate of H2 as 



r ) J V 10 



where 



/sh(£) 



1, x< 1 

x" 3 / 4 , x > 1 



Here Ah 2 is the column density of hydrogen molecules measured from the source star, k^l is the unshielded photodis- 
sociation rate in the neighbor of the source at r = Tq. We also multiply the geometrical dimming factor, (r /r) 2 . The 
simplest method is to use equation (20) directly, which works well in the case AAh 2 J$ 10 14 cm~ 2 is satisfied. Here AAh 2 
denotes the H2 column density of a single spatial grid generated along the light ray, i.e. AAh 2 = \ ( n H 2 U) + n a 2 (*)) ^ r i- 
However, for AAjj 2 3> 10 14 cm~ 2 , the photodissociation rate is significantly reduced by the absorption in LW band 
while LW photons are passing through only a single grid. This could delay the propagation of H2 dissociation front 
because the significant amount of dissociation photons are absorbed in a single grid without being used to dissociate 
H2. In order to overcome this problem, we take similar approach to that in photoionization problem. In the case of 
photoionization, we use a sort of 'photon conserving' method by integrating the photoionization rate over a single grid. 
Similarly, we integrate the photodissociation rate over a single grid, and assess the volume averaged photodissociation 
rate as follows: 
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where denote the H2 column density at 



: ± Arj/2, respectively. Note that the factor fc^/(iH 2 



can be factorized 



outside the integral since is proportional to nn 2 - The last integral in the above equation is easily performed 
analytically. Thus, we obtain the expression of volume averaged photodissociation rate in our scheme. Following 
the method in the case of photoionization. we also employ the switching scheme from/to the optically thin to/from 
optically thick regime described in equation (19) (switches from/to expression in equation (20) to/from equation (21)). 
Now we are ready to integrate the rate equations and energy equations implicitly: 
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(22) 
(23) 

(24) 



Here the suffix on the right shoulders of the characters denote the labels of the grid in time coordinate. At is the 
time step, which is given by the algorithm discussed in section 2.5. yi and e represent the z-th chemical species and 
internal energy of the gas. hi, T and A denote the reaction rate of i-th species, photohcating rate, and cooling rate, all 
of which are assessed at (n+ l)-th time step, while the last term in the second equation denotes the adiabatic term, 
which is evaluated at n-th time step. In other words, only the adiabatic term is taken into account in an explicit form. 
Using these equations, we find the solutions {y| n+1 ' > } and e^ n+1 ^> for all particles by iterative operations. 

2.5. Time Step Control 

It is possible to use individual time step control depending on the local density in hydrodynamic simulations. 
However, in radiation hydrodynamic simulations, the justification of the scheme is unknown and uncertain since the 
effects of radiation field is quite global. Therefore, we use synchronous time steps for all particles. 

As a first step, At is set as the minimal value among the Courant conditions and the time scales of accelerations 
for all SPH particles: 



At 



hydro 



max(c sl ,ii 1 ) ' V Vi 



for all i. 



(25) 



Afterwards, we try to find solutions of the rate equations and energy equations for all particles by the implicit time 
integration solver. If we fail to find the solutions, i.e. the convergence is very slow, then we divide the time step by 
two, and try to find the solutions again. We continue this operation until we find the solutions within reasonable 
iteration steps (nominally 20 ). After finding the solution for energy equations and chemical reaction equations, we 
update the particle positions and velocity by explicit leap-flog time integration. 



3. Results of Test Calculations 



We perform various tests with the code, and will show the results comparing with the numerical solutions from one 
dimensional radiation hydrodynamic simulations. We use 8 nodes ( 16 Xeon processors with gigabit ethernet) and 
1048576 SPH particles for the tests unless stated. 

3.1. Expanding HII Region in Uniform Media 

In this test, we put a single source at the center of the uniform gas cloud and trace the propagation of ionization 
front. The initial number density of the gas is tih = 0.01cm -3 , and the temperature is T = 3 x 10 2 K. The ionizing 
photon luminosity of the source is S = 1.33 x 10 50 s -1 and the spectrum is black body with T* = 9.92 x 10 4 K, that are 
the parameters of POPIII stars with 120M Q (Baraffe, Heger & Woosley 2001 ). 

Figure 5 shows the propagation of ionization front (rj ). Numerical results from the code are shown by red dots, 
whereas the dotted line denotes the results from the one dimensional Piecewise Parabolic Method (PPM) code (Colclla 
& Woodward 1984). The results agree well with each other. Note that 1D-PPM code is also tested by the comparison 
with analytic solutions and results from the similar code developed by Tetsu Kitayama (Kitayama et al. 2004), and 
the two results agree very well with each other. 

Figure 6 shows the spatial distributions of physical quantities (HI fraction , temperature, density, H 2 fraction ) 
at different snapshots ( t = 10 7 yr, 10 8 yr, 10 9 yr). The results are also compared with the plots of one dimensional 
simulation. Slight scatter and deviation is found in the spatial distribution, especially at later epoch. However, the 
agreement between the two results are acceptable. 
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3.2. Expanding HII Region in p oc r 2 Density Profile 

In this test, density distribution is different from the previous test. We assume "core-envelope" structure, where the 
uniform central core density njj = 100cm -3 whose radius is lOpc. We also assume the density of the outer envelope 
that decline as oc r~ 2 , which is typical of the slope in collapsing prestellar gas cloud. Since the Sromgrcn radius in 
the core is slightly smaller than the core radius, initial R-type ionization front is stopped in the core. Afterwards, the 
ionization front changes to D-type. While the D-typc front propagating in the envelope, the front type becomes R-type 
again, because the the slope is steeper than the critical case where p oc r -15 (Franco, Tcnorio-Tagle & Bodcnhcimcr 
1990). 

The results are compared with the 1-D results again in Figures. 7 and 8. Three different lines shown in Figure 8 
correspond to t — 4 x 10 5 yr, 10 6 yr,4 x 10 6 yr, respectively. At the final epoch, the type of the ionization front is being 
changed to R-type from D-type. Again, we find good agreement between the two results, although we have slight 
difference at the final output. At the final epoch, the propagation of the shock front is slightly delayed, which might 
be cased by the error in evaluating the optical depth in front of the density peak. In fact, the width of the density 
peak at the D-front is broader than that of the ID result. Consequently, the ionized region could be slightly smaller 
in SPH simulation, which can introduce the delay of propagation time of the shock front. 

3.3. Expanding HII Region with Multiple Sources 

We also show the HI and density structures that are created by multiple sources. We perform RHD simulations 
with randomly distributed ten sources in the uniform spherical cloud (nn = 1cm -3 , R=l kpc). All of the sources are 
assumed to have the same effective temperature and the luminosity as in the previous test. The number of particles 
used in this simulation is half of the previous two tests, i.e. Asph = 524288. 

The panels in Figure 9 show the time evolution of the density (left column) and HI fraction (right column) distri- 
butions on a certain spatial slice of the cubic region that are hollowed out from the spherical simulated cloud. 

In the first phase, all HII regions expand as R-type and and they reach Stromgren radius (from top panels to middle 
panels). Afterwards, the photoheated gas starts to expand dynamically, and dense shells are formed surrounding the 
sources. The shells collide with each other, and web of dense shells is formed (bottom panels). We use this calculation 
as the benchmark of our code in the next section. 

3.4- Speedup by parallelization 

We assess the speedup of parallelization by performing simulations with multiple sources in uniform media described 
in section 3.3. The initial setup of the simulations are the same as in section 3.3. We use 2, 4, 8, 16 and 32 processors 
(1, 2, 4, 8 and 16 nodes) for the same setup, and compare the clock time per single time step with each other. 

We use the first model of FIRST cluster (consists of 16 nodes with 32 processors) in University of Tsukuba for 
these test runs. The newly being developed PC cluster "FIRST" is designed to elucidate the origin of first generation 
objects in the Universe through large-scale simulations. The cluster will consist of 256 nodes ( each node has dual 
Xeon processors ) connected by two dimensional hyper-crossbar quad-gigabit network utilizing trunk technology. The 
most striking feature of the system is that each node has small GRAPE6 PCI-X board called Brade-GRAPE. The 
Brade-GRAPE has four GRAPE6 chips per board (thus 1/8 processors of a full board). The board has basically the 
same function as GRAPE-6A (Fukushige, Makino & Kawai 2005), but it has an advantage that it can be installed in 
2U server unit of PC clusters due to the short height of its heat sink. 

Figure 10 shows the speedup of the performance. The vertical axis denotes the clock time by single node run (two 
processors) divided by the time for -/V no dc nodes, and the horizontal axis shows the number of nodes. The dotted 
line represents the case of ideal speedup, and the symbols correspond to the runs with various number of sources 

(Source = 2, 10,20). 

Some of present results are slightly better than ideal case (e.g. iV no dc = 8, A sourco = 20 case). The chief reason of 
this super linearity is the different convergence speed of implicit solver for different number of nodes. The numerical 
calculations between the runs with different number of nodes are not exactly the same ( see section 2.3 ). Consequently, 
the number of time steps vary from runs to runs and the convergence speed for each time step is also not identical. 
Thus, such different convergence property could lead to the super linear speed up. Apart from the fluctuation due to 
this unevenness of time steps, parallelization efficiency is quite good as far as the number of nodes are smaller than 

16 for Source > 10. 

For (iV no de, iVsourcc) = (16, 2) case, the speedup is not as good as those for other runs. This run include only 2 
sources, which number is much smaller than the number of nodes. Thus the 'waiting' time in the ray tracing routine 
could emerge. In fact, for larger number of sources, we do not observe such reduction of parallelization efficiency. 
However, this result also means when we increase the number of nodes to ~ 100, similar phenomenon is expected with 
-^source ~ 10. Therefore, we have to keep in mind this property when we run the code with much larger number of 
nodes in the future after the completion of full FIRST cluster. 
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4. Summary 

Wc have constructed the new code for radiation hydrodynamics designed to be applied to the feedback effects of first 
generation stars/galaxies. The code can deal with multiple sources, and it can solve the transfer of ionizing photons 
and photodissociating photons. 

The code is already parallelized, and almost ready to be installed in the newly developed full size huge PC cluster 
"FIRST" in University of Tsukuba. We compare the numerical results with one dimensional calculations, and find 
the good agreements. 
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Appendix 1. Volume averaged photoionization rate with helium 

In this section, we show the basic idea of the volume averaged photoionization/photoheating rate for H/He gas. 
It could be useful to the readers although we have not implemented yet. First of all, we have the radiation transfer 
equation assuming the pure absorption (i.e. on-the-spot approximation): 

= -Kia™ + n Hcl a™ + n^u^ 11 )!, (Al) 

dr 

where and nx denote the photoionization cross section and number density of X species, respectively. The 
photoionization rates for HI, Hel and Hell are 

C = nm i i ^<r™dndv, (A2) 



*g? = »Hei/ Hr-o?*dM», (A3) 



kgf = n Ho ii / / i^a« eLL dndv, (A4) 



Here vl^lylcI and ^lhoII denote the Lyman limit frequency of HI, Hel, HcII respectively. 

Combining cquation(Al) with cquations(A2)-(A4), and perform the integration in solid angle, wc have 

hi = i r< ^ ;B pN 

Jv L ("<?■) tot dr hv 



4nr 2 J VLnel (ncr)tot dr hv 

' f «^<£!!*H->*,, (A7) 

47rr 2 J ULHolI (ncr)tot dr hv 

where 

(ncr)tot = nm<J™ + «HoICT^ cI + nn c ua^ eU 
(na)totdr 

o 

Then we assume that the ratio such as rix^ / '(^"Otot m equations (A5)-(A7) are independent of r within the single 
grid, which is a regular assumption for finite difference scheme. Therefore we can regard the ratio as constants when 
we integrate the equation within a single grid. The volume averaged rates are as follows: 

- M _ 3 $HV,-Ar a /2)-$«V t + Ar t /2) 
ion An 3rf + Ar?/4 ' 1 1 

3 $**\ ri - An/2) -$?°\ ri + An/2) 
ion An 3rf + Arf/4 ' 1 ' 
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3 <f>? oII (n - An/2) - gf cII ( r, + Ar,/2) 
Ari 3rf + Arf/4 



e H = 
where 



$Hi (r) = f rpna^LteM-r.) (An) 
J VL (ncr)tot 47r /it/ 

$ Hei (r)= r np^^exp^ 

•/^Hel ( nCr )t°t 47T ZlZ-' 

$ He„ (r)= r nHen^^expC-r,)^ 

The photohcating rates are evaluated by exactly the same way as above. The volume averaged photoheating rates 
are 

~FhT _ 3 *™( ri - An/2) -*™( ri + An/2) 

1 ion — A„ Q„2 , A„2//l ' l Ai4 i 



An 


3r? + Ar?/4 


3 ^ eI (n- 


Ar i /2)-$H cI (r, + Ar i /2) 


Ar, 


3r? + Arf/4 


3 $$ eII (ri- 


- Ar,/2) - $Hdi (rj + Ar . /2) 



Ar, 3r| + Arf/4 

3 ^ cII (r l -Ar l /2)-$f oII i 
An~ 3rf + Arf/4 



i ion o„2 T A„2 71 ' l Ai0 i 



where 

J VL (na)tot 47r /iiv 
, H ci/ \ f°° «Hcio-? cI exp(-T„) 

*2 (»■)=/ -7 — — r '-(hv-his L Hei)dv, A18 

$ 2 (r = / — - — [hv - hvLHeil) dv. (A19) 

It is also worth to point out that the functions <I>* and can be assessed only by the optical depth at three 
frequencies z^lhcI and z^Hcii), although the expressions (All) - (A13) and (A17) - (A19) include frequency 
dependent optical depth t v (Nakamoto, Umemura, & Susa 2001). Consequently, we can create the tables of ionization 
rates and heating rates as functions of optical depth and the ratio n-xc^/ (na)tot at Lyman limits (l>l, vlubI and z/LHeii)- 
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Fig. 1. Parallel tree scheme to assess the force of gravity on the particles included in a processor(PUl) is schematically 
shown. Left panel shows the initial phase of the algorithm, in which local tree is made on each processor. In the next 
step, the tree nodes which can interact the particles in PU1 are found by walking along the local tree. The data of the 
nodes are sent to PU1 as "particles" that do not have inner structure (upper middle panel). Finally, we have the com- 
plete list of particles which can interact with the particles in PUl(right panel). Utilizing this particle list, we can assess 
the force of gravity on the particles in PU1 by building up and walking down the locally essential tree (lower middle). 
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Fig. 2. Employed neighbor search algorithm on parallel computer system to obtain the neighbour list of a certain pro- 
cessor (PU1) is shown. Left panel shows the initial phase of the algorithm, in which local tree is made on each 
processor. In the next step, the data of particles in the "margin" of PU1 are sent to PU1 (upper middle). 
Finally, the list of particles which can be in the neighbor of particles in PU1 is completed (right panel). Using 
this list, locally essential tree is re-constructed, which is used to find neighbor particles in PU1 ( lower middle ). 
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Fig. 3. Ray tracing method for multiple sources are schematically shown. Numbers in circles represent the ID of SPH par- 
ticles in the neighbor of SPH particle "0". SI and S2 represent the position of two sources. Two wedge-shapes show 
the position on which the optical depth from the sources are projected from corresponding particles (in this case 2 and 4). 
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Fig. 4. Order of computations and communications for ray tracing on parallel machine is shown schematically for an example. In 
this example, the computational domain is decomposed into 2D, and consists of 4 nodes. The calculations and communications 
proceed along the direction of small open arrows. The position of two sources are shown by star-shaped symbols. The shaded 
region on the nodes are occupied by computation, and the shaded arrows represents the direction of the present communications. 
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Fig. 5. Propagation of ionization front in uniform media is shown. Initial density and temperature is njj = 0.01cm 3 ,T = 3 X 10 2 K. 
The ionizing photon luminosity of the source is 5 = 1.33 X 10 50 s — 1 and the spectrum is black body with 
T* = 9.92 X 10 4 K. Horizontal axis shows the time after the ignition of central star, and the vertical axis shows the 
position of ionization front, defined as the radius at which the fraction of HI is 0.5. Red dots denote the re- 
sults from the SPH code, whereas the dotted line denotes the ionization front position from the ID simulation. 
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Fig. 6. Spatial distribution of physical quantities at three time snapshots (t = 10 7 , 10 8 , 10 9 yr) are shown for the run shown 
in Fig. 5. Upper panels show the density and temperature, and the lower panels show the HI fraction and H2 frac- 
tion. Horizontal axes denote the distance from the source. The dotted lines denote the results from the ID simulation. 
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Same as Fig. 6, except that the initial density distribution is core-envelope structure, discussed in section3.2 




Fig. 9. Spatial slice of the multi-source run in section 3.3 is shown. Left panels show the evolution of den- 
sity, and the ionization structures are shown in the right panels. The color legends arc shown at the bottom. 
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Fig. 10. Speedup of calculations by parallclization is shown. Vertical axis denotes the clock time for one node (two pro- 
cessors) divided by the clock time with AT no< j e nodes. Horizontal axis shows the number of nodes. The dotted line 
represents the case of ideal speedup. Symbols connected by the solid lines are the results with different number of 
sources (N aonTce ): open circles show the results for -/V aource = 20, vertices for N BOUICe = 10, and triangles for iV aource = 2. 



